@mzuenni
Collaborator

@mzuenni mzuenni commented Nov 17, 2025

solves #312
@thorehusfeldt do you mind adjusting the schemas?

@mzuenni mzuenni requested a review from mpsijm November 17, 2025 21:58
@mzuenni mzuenni marked this pull request as ready for review November 19, 2025 16:22
@RagnarGrootKoerkamp
Owner

Should we also allow this on the answer files, to ensure a testcase is (im)possible as intended?

@thorehusfeldt
Collaborator

Consider bumping the generator framework version.

The sample generators.yaml script linked from doc might want to include

version: 2025-12 

and the default in the CUE schema generators key (and presumably the JSON file) updated accordingly.

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 3, 2025

@RagnarGrootKoerkamp :

Should we also allow this on the answer files?

But ans: possible and ans: impossible can already be specified. I guess what you are proposing would be to support “make sure the answer is not impossible”, for instance by

ans-match: \d+

Hm. I find this useful, but now it’s getting ugly.

An idea for syntax that is consistent with the current proposal:

match:
  in: foo
  ans: bar
---
match:
  in: [42, forty-two] 
  ans: bar
---
match: \w+\s\w+ # same as match: { in: \w+\s\w+ }
---
# same as match: { in: [42, forty-two] }:
match:
  - 42
  - forty-two

I guess the schema is

match: string | [...string] | {
  in: string | [...string]
  ans: string | [...string]
}
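
To make the shorthand rules concrete, here is a minimal Python sketch of how a tool might expand the three accepted shapes into one canonical form. The name `normalize_match` is hypothetical and not part of BAPCtools. The sketch assumes that `in` and `ans` are both optional inside the mapping, and that YAML scalars such as 42 arrive as integers and should be read as strings.

```python
from typing import Union

MatchSpec = Union[str, int, list, dict]


def normalize_match(spec: MatchSpec) -> dict:
    """Expand any accepted `match` shape into {"in": [...], "ans": [...]}."""

    def as_list(value) -> list:
        # YAML parses a bare 42 as an int, so coerce every entry to str.
        if isinstance(value, list):
            return [str(v) for v in value]
        return [str(value)]

    if isinstance(spec, dict):
        unknown = set(spec) - {"in", "ans"}
        if unknown:
            raise ValueError(f"unexpected keys in match: {sorted(unknown)}")
        # Assumption: a missing key simply means "no patterns for that file".
        return {key: as_list(spec[key]) if key in spec else [] for key in ("in", "ans")}

    # A bare string or list is shorthand for { in: ... }.
    return {"in": as_list(spec), "ans": []}
```

With this, `match: [42, forty-two]` and `match: { in: [42, forty-two] }` normalize to the same dictionary, which is the equivalence the examples above rely on.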

@mzuenni
Collaborator Author

mzuenni commented Dec 3, 2025

Even though this might be less YAML-like, I would prefer not to nest these and instead go for something like in.match or match.in?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

Just to make sure I was clear: I propose to retain

match: \d+

as a valid expression, and expect it to be the widest-used form. I propose that the above is the same as

match:
  in: \d+

(which, thanks to standard YAML syntax, can also be written as a one-liner, match: { in: \d+ }, but which is not the same as match.in: \d+. )

The situation in which the “mapping” form would mainly arise is when you want to specify something about ans, like “the answer is not impossible”. Not sure what kinds of conventions will arise among authors, but here are some suggestions:

match: { ans: ^[^i] }
---
match: { ans: \d+ }
---
match: { ans: ^(?!impossible$).* }

I would advise against introducing more keys in the top-level mapping (such as match, match.in, and match.ans); tool support for YAML is just better when we stick to YAML conventions.


The main alternatives I can see to my proposal would be to add pattern to in and ans. (I use pattern here instead of match just to keep the proposals syntactically separate.)

[in|ans]: string | {
  value: string
  pattern: string | [...string]
}

so you’d have expressions like this:

generate: make_random_tree -n 100 --balanced {seed:0}
in:
  pattern: \d+
ans: impossible

This doesn’t smell right to me, but it’s just a hunch.

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

I notice that we already have a plethora of stuff, namely

["in" | "in.statement" | "in.download" |
    "ans" | "ans.statement" | "ans.download" |
    "out"]: string

The current semantics is that the key: value pair means "<testcasename>.<key> must equal value". What we’re looking for in the current proposal is a semantics that says "<testcasename>.<key> should obey constraint".

This is a case against introducing keys like in.match, by the way. You’d need ans.statement.match etc.

My hunch is that the cleanest way is to enrich the right-hand side, instead of introducing more left-hand sides of such expressions.

I think what I’m saying is

let extension = "in" | "in.statement" | "in.download" |  "ans" | "ans.statement" | "ans.download" | "out"
[extension]:  string # as we have now
match: string | { [extension]: string } # default string same as { in: string }

allowing

in: foo
match:
  ans.statement: \d\w+ 

Alternatively,

let extension = "in" | "in.statement" | "in.download" |  "ans" | "ans.statement" | "ans.download" | "out"
[extension]:  string  | { match: string }
in: foo
ans.statement:
  match: \d\w+ 
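
A minimal Python sketch of how this alternative could be interpreted, where a plain string keeps the current "must equal" semantics and a mapping with a match key means "must obey constraint". The helper name `check_file` is hypothetical. The use of `re.search` and the exact (unstripped) string comparison are assumptions, not something this thread has settled.

```python
import re


def check_file(content: str, spec) -> bool:
    """Return True if `content` satisfies `spec` (hypothetical helper)."""
    if isinstance(spec, str):
        # Plain string: current semantics, the file must equal the value.
        return content == spec
    if isinstance(spec, dict) and set(spec) == {"match"}:
        # Mapping form: the file must contain a match for the regex.
        return re.search(spec["match"], content) is not None
    raise ValueError(f"unsupported spec: {spec!r}")
```

For example, `check_file("12ab", {"match": r"\d\w+"})` holds, while `check_file("ab", {"match": r"\d\w+"})` does not.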

Dream state

The dream state would be what CUE already supports out of the box:

ans: "impossible"  # ans must equal impossible
---
in: number & >0 # in must be a number, and strictly larger than 0
---
ans: "yes" | "no" # ans  must be either "yes" or "no"
in.statement: =~"^\\w\\w$" # in.statement has two letters
in: !~"impossible" # in does not contain impossible
in: in.statement # in and in.statement are identical

In other words, there’s a whole grammar on the right hand side supporting |, &, literal match, and =~ and !~ for regex match and unmatch.

Note that explicit creation and constraint checking are the same: CUE just unifies everything it knows about, say .in (including whatever copy or generate may have produced) and expects the result to be a singleton. Otherwise it complains. Specifying a constraint is the same as specifying a value (the latter is just a constraint with a singleton valid instantiation.)

This would be sah-weet!

@RagnarGrootKoerkamp
Owner

Interesting idea to do in: {match: ...}, sounds reasonable as well to me, but no strong opinion either way.

Should it be matches instead of match maybe? As in the .ans matches X Y Z.

@mzuenni
Collaborator Author

mzuenni commented Dec 4, 2025

The main alternatives I can see to my proposal would be to add pattern to in and ans. (I use pattern here instead of match just to keep the proposals syntactically separate.)

I don't like that, also feels weird in combination with generated testcases...

["in" | "in.statement" | "in.download" |
   "ans" | "ans.statement" | "ans.download" |
   "out"]: string

I don't think we need this for anything other than .in and .ans, since this is only intended to additionally check generated files. The others are typically hardcoded already?

and =~ and !~ for regex match and unmatch.

Unmatch would certainly be nice...

match: string | [...string] | {
  in: string | [...string]
  ans: string | [...string]
}

I am fine with that, even though I like to not nest things... :D

The question is if/how we want to support unmatch then?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

The question is if/how we want to support unmatch then?

The current proposal already supports “unmatching”, since regexen support that. Here are the three examples from upthread again, for a problem with output impossible or some numbers:

match: { ans: ^[^i] }
---
match: { ans: \d+ }
---
match: { ans: ^(?!impossible$).* } 

CUE of course would make this nicer to look at:

ans: !~"impossible"

@mzuenni
Collaborator Author

mzuenni commented Dec 4, 2025

match: { ans: ^(?!impossible$).* }

I don't think that one is right? (\A(?!.*^impossible$).*\Z would work but is not very nice...) We could say that if the string starts with ! we do an unmatch, and if it starts with = we do a match.
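
A minimal Python sketch of the prefix idea, assuming the check is done with `re.search` and `re.MULTILINE`. The function name `check_prefixed` is hypothetical, and treating a pattern without a prefix like = is an additional assumption.

```python
import re


def check_prefixed(pattern: str, content: str) -> bool:
    """Hypothetical: a leading '!' means "must not match", '=' means "must match"."""
    if pattern.startswith("!"):
        return re.search(pattern[1:], content, re.MULTILINE) is None
    if pattern.startswith("="):
        pattern = pattern[1:]
    # Assumption: a pattern without a prefix behaves like '='.
    return re.search(pattern, content, re.MULTILINE) is not None
```

With this convention, "the answer is not impossible" becomes `!^impossible$`. One cost of the design is that a regex which should itself begin with ! or = would need some escaping rule.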

@thorehusfeldt
Collaborator

The only thing I’m unsure about for my negative lookahead regex is what to do with a possibly trailing newline. (I don’t understand the specification well enough.) So maybe it should be ^(?!impossible).* instead. Otherwise I’m pretty sure it’s fine.

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

I think the issue is that we don't do a full match but a search, and your pattern still matches a suffix not containing impossible? Anyway: yes it can be expressed... the question is do we want simpler syntax for this? ^^'

@thorehusfeldt
Collaborator

we don't do a full match but a search

Now I understand. That’s what ^ is for in my expression. (You prefer \A.)

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

But ^ matches at the start of any line, whereas \A matches only at the start of the first line.
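
The difference can be reproduced with Python's `re` module. This is only a sketch: it assumes the check is a `re.search` with `re.MULTILINE` on a file that ends in a newline, which this thread does not confirm, and the `ANS` content is made up for illustration.

```python
import re

ANS = "impossible\n"  # hypothetical .ans content with a trailing newline

lookahead = r"^(?!impossible$).*"

# With re.MULTILINE, ^ also matches right after the final newline, so the
# search succeeds with an empty match on the empty "last line".
multi = re.search(lookahead, ANS, re.MULTILINE)

# Without re.MULTILINE, ^ only matches at the very start, so the search fails.
single = re.search(lookahead, ANS)

# \A ignores the flag and always anchors at the start of the string.
anchored = re.search(r"\A(?!impossible$).*", ANS, re.MULTILINE)

print(multi is not None, single is None, anchored is None)
```

So whether the ^ version rejects an impossible answer depends on the flags in use, while the \A version rejects it in both cases.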

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 5, 2025

Hear me out. This actually works:

  1. Syntax

Allow certain CUE expressions as the right-hand sides of in:, ans:, etc. To be precise, allow string expressions.

For instance, we can do

in: "impossible" # just like we always have
---
in: =~"\\d\\d" # two digits
---
in: "foo"  | "bar"
---
in: =~"^[a-z]+$" & !~"^impossible$" # alphabetic word, but not impossible

and a thousand other things. CUE is quite expressive. The main use cases are disjunctions and regex match and unmatch.

What is new is that the right hand side is now a constraint. If no in-key is present, it defaults to in: string.

  2. Semantics

For a generator rule, various files can be created: by generate, by copy, or by the default submission producing ans.

Whatever has been produced (maybe nothing) is now unified using CUE and must produce a concrete value (i.e., a concrete string). In the simplest case, the expression

ans: "impossible"

means that the output of the default submission will be unified with "impossible". In this special case, this means that the two strings need to be the same. This is exactly the behaviour that we already have.

But if we had

ans: "yes" | "maybe"

the output of the default submission could be yes or maybe, since both of those strings unify with the ans-expression.
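
For readers who don't know CUE, the special case described here (one side is a concrete string, the other a constraint) can be emulated roughly in Python. This is not CUE and not a proposal for the implementation. All names are hypothetical, regexes use Python's `re` syntax rather than CUE's, and trailing newlines are ignored.

```python
import re
from typing import Callable

Constraint = Callable[[str], bool]


def literal(value: str) -> Constraint:
    return lambda s: s == value


def matches(pattern: str) -> Constraint:  # stands in for =~
    return lambda s: re.search(pattern, s) is not None


def not_matches(pattern: str) -> Constraint:  # stands in for !~
    return lambda s: re.search(pattern, s) is None


def either(*options: Constraint) -> Constraint:  # stands in for |
    return lambda s: any(option(s) for option in options)


def both(*parts: Constraint) -> Constraint:  # stands in for &
    return lambda s: all(part(s) for part in parts)


def unify(produced: str, constraint: Constraint) -> str:
    """Return the produced string if it satisfies the constraint, else raise."""
    if not constraint(produced):
        raise ValueError(f"{produced!r} does not satisfy the constraint")
    return produced
```

Here `unify("yes", either(literal("yes"), literal("maybe")))` returns "yes", whereas the same call with "no" raises, mirroring the error that unification would report.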

  3. Implementation

The CUE CLI already does this. You can set up a very small CUE snippet:

input: string   // will be filled from CLI with concrete value
expr: "foo" | "bar"  // the value of an ans-key in generators.yaml
ok:   input & expr

cue cmd --inject input=foobarbaz replaces input with exactly "foobarbaz", and then CUE does its magic by trying to unify ok. The result of the command is either an error (in this case because "foobarbaz" does not satisfy the rule) or the unified string.

The only reason to not do this is that it increases the dependencies of BAPCtools. (Which is a good enough reason, I think.)

Still, cool AF. Backwards compatible.

@mzuenni
Collaborator Author

mzuenni commented Dec 5, 2025

I actually don't understand what you want to suggest? ^^'

suppose one of your examples is the actual generators.yaml:

data:
  secret:
    - testcase:
        generate: gen.py
        in: "foo"  | "bar"

What is supposed to happen (in our implementation)? Do we first run cue on the YAML? Do we parse the YAML?
